1. Introduction

Over the past years the study of agent-based models of financial markets or other 
phenomena with the tools of statistical mechanics has proved to be exceptionally fruitful. 
Many such models can, in the language of statistical mechanics, be understood as fully 
connected mean-field systems comprising disordered interactions and various types of 
global frustration. They are thus perfectly suited to be addressed with the techniques 
developed originally for other purposes, such as the study of magnetic systems, spin- 
glasses or neural networks. Indeed, both static and dynamical methods, including replica 
techniques and path integrals, have been successfully applied for example to the Minority 
Game (MG) [2], presumably one of the most studied models in this context, and have 
led to an advanced theoretical understanding of the behaviour of various versions of the 
MG [3-5]. On the other hand these studies have also revealed new types of complexity 
and phase transitions, which hitherto had been unknown to statistical physicists. 

The standard versions of the MG describe an ensemble of interacting agents who at 
each time step react to publicly available information by taking trading decisions, e.g. 
to buy or to sell a given asset. While it is crucial in this setting that the information 
made available is identical for all agents, no interaction between the individual traders 
other than through this global and uniform signal is present in the MG. Furthermore,
it is essentially of little relevance whether this stream of information is generated 
endogenously by the system or drawn externally at random [6] as long as all agents 
react to the same signal. This suffices to generate a remarkably complex dynamics with 
phenomena including phase transitions, non-ergodic regimes, replica symmetry breaking 
and memory effects.

The aim of the present paper is to extend the dynamical path-integral formalism 
to the study of models with private, agent-dependent information. These cases play a 
major role in economic theory, especially in view of the connection between asymmetric 
information and the failure of market equilibrium [7]. The model we address here 
was first introduced in [1] and is a close relative of the Shapley-Shubik model of non- 
cooperative trading equilibrium [8]. While the focus of [1] lies mainly on the statics of 
the model using replica techniques, we will here complement this work by an analysis 
of the dynamics based on generating functionals for systems with quenched disorder [9]. 
Although the dynamical analysis presented here parallels that of the MG, the model 
displays some novel features and new types of phases. Our analysis sets the stage for 
further studies addressing subtle issues related to fluctuations and dependence on the 
learning rate, inherently dynamical features which a static analysis is unable to capture. In the
present paper we will mostly be concerned with the mathematical analysis of the model;
details of the economic background can be found in [1] and references therein.



2. Model definitions 



The definition of the model follows closely that given in [1]. One considers a single-asset
market with $N$ agents labelled by Roman indices. It is assumed that the asset pays
a monetary return $R(\ell)$ at the end of each market round $\ell = 0, 1, 2, \ldots$. This return
depends on the value of a discrete variable $\omega(\ell)$, which models the "state of the world"
and is similar to the global signal made available to agents in MGs. We will here assume
that $\omega(\ell)$ is determined externally, similar to MGs with randomly drawn exogenous
information. Specifically, $\omega(\ell)$ is selected randomly and independently at each round
with flat probability distribution from the set $\{1,\ldots,\Omega\}$. The statistical mechanics
analysis will ultimately be concerned with the thermodynamic limit $N \to \infty$, which
is taken in a way such that the relative number of possible states of the world $\Omega/N$
remains finite. The ratio $\alpha = \Omega/N$ turns out to be the key control parameter of the
model. The asset return at time $\ell$ is then given as $R(\ell) = R^{\omega(\ell)}$, where the components
of the vector $R = (R^1, \ldots, R^\Omega)$ are taken to be quenched random variables, drawn at
the start of the game and then kept fixed. It is assumed that they each take the form

$$R^\omega = \bar{R} + \frac{\tilde{R}^\omega}{\sqrt{N}}, \tag{1}$$

where $\bar{R} > 0$ is a constant, and where the $\{\tilde{R}^\omega\}$ are independent, identically distributed
Gaussian random variables with zero mean and variance $\lambda^2 > 0$. Thus, the Bernoulli
process $\{\omega(\ell)\}_{\ell \geq 0}$ induces the time series of asset returns $\{R(\ell) = R^{\omega(\ell)}\}_{\ell \geq 0}$. $\bar{R}$ and $\lambda$
are additional model parameters. The motivation for the choice (1) will become clear
in the following.

Contrary to MGs, traders in this model are unable to observe the state $\omega(\ell)$ directly.
Rather, each of them has access only to a coarse-grained signal on $\{1,\ldots,\Omega\}$ which
corresponds to some fixed private information scheme. In particular, the signal observed
by a given trader $i$ is determined by the vector

$$k_i : \{1,\ldots,\Omega\} \ni \omega \mapsto k_i^\omega \in \{-1,1\}. \tag{2}$$

The components of any $k_i$ ($i = 1,\ldots,N$) are again assumed to be drawn at random and
independently from $\{-1,1\}$ with equal probability for all $i$ and $\omega$ at the beginning of the
game and are kept fixed afterwards. In this way, each trader has a private information
source providing him with a binary signal $k_i^{\omega(\ell)}$ at time $\ell$. This private signal does
depend on the state $\omega(\ell)$, but exactly what this state is at time $\ell$ is not known to the
individual agents. Since the $\{R^\omega\}$ and the $\{k_i\}$ are drawn independently, the sequence
$\{k_i^{\omega(\ell)}\}_{\ell \geq 0}$ observed by agent $i$ will in general not allow him to tell what return $R^{\omega(\ell)}$ is
to be expected at time step $\ell$. Crucially, however, the correlations between the vectors
$k_i$ and $R$ are heterogeneous across the population of traders, so that different agents
will have varying abilities to resolve the individual states $\omega \in \{1,\ldots,\Omega\}$.

At each round $\ell$ of the game every agent decides to invest a monetary amount
$z_{i k_i^{\omega(\ell)}}(\ell) \geq 0$, which depends on the signal $k_i^{\omega(\ell)} \in \{-1,1\}$ he receives at stage $\ell$. The
total amount invested by agents at round $\ell$ determines the price of the asset:

$$p(\ell) = \frac{1}{N} \sum_{i=1}^{N} \sum_{\sigma \in \{-1,1\}} z_{i\sigma}(\ell)\, \delta_{\sigma, k_i^{\omega(\ell)}}. \tag{3}$$

It remains to specify how the agents determine the amounts $\{z_{i\sigma}(\ell)\}$ they invest. It is
assumed that traders are adaptive and that their behaviour is governed by an inductive
learning dynamics. Specifically, each agent has a propensity $u_{i\sigma}(\ell)$ to invest under each
of the two possible signals $\sigma \in \{-1,1\}$; these propensities are initialized at values $u_{i\sigma}(0)$
at time $\ell = 0$ and are updated at the end of every round according to the marginal
success of the investment:

$$u_{i\sigma}(\ell+1) = u_{i\sigma}(\ell) + \Gamma\, \frac{\partial Q_i(\ell)}{\partial z_{i\sigma}(\ell)}. \tag{4}$$

Here $\Gamma > 0$ is a learning rate‡, while $Q_i(\ell)$ stands for the payoff received by trader $i$
at the end of round $\ell$. The $\{Q_i(\ell)\}$ can be specified upon noting that at each round
every agent puts forward his bid before $p(\ell)$ is known and makes profit if the return of
the asset exceeds its price. The price in turn is determined by the collective action of
all traders, Eq. (3). Now since agent $i$ acquires $z_{i k_i^{\omega(\ell)}}(\ell)/p(\ell)$ units of the asset, each
yielding an effective return of $R(\ell) - p(\ell)$, the following form for the payoffs is assumed:

$$Q_i(\ell) = \frac{R(\ell) - p(\ell)}{p(\ell)}\, z_{i k_i^{\omega(\ell)}}(\ell). \tag{5}$$

‡ In principle, different agents $i$ might have different learning rates $\Gamma_i$. We here restrict the discussion
to the case $\Gamma_i = \Gamma$ for all $i$. Our theory can be generalised to populations of agents with heterogeneous
learning rates.

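For simplicity we linearise the payoff as in [1] and consider

$$Q_i(\ell) = [R(\ell) - p(\ell)] \sum_{\sigma \in \{-1,1\}} \delta_{\sigma, k_i^{\omega(\ell)}}\, z_{i\sigma}(\ell). \tag{6}$$

To compute the marginal payoffs $\partial Q_i(\ell)/\partial z_{i\sigma}(\ell)$, note that $p(\ell)$ is itself a function of $z_{i\sigma}(\ell)$, so
that in principle one has

$$\frac{\partial Q_i(\ell)}{\partial z_{i\sigma}(\ell)} = \delta_{\sigma, k_i^{\omega(\ell)}} \left[ R(\ell) - p(\ell) - \frac{1}{N} z_{i\sigma}(\ell) \right]. \tag{7}$$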


In general, traders may not be able to evaluate how much their decision affects the price:
naive agents who neglect the impact of their own trading action might ignore the
second term on the right-hand side, while other so-called sophisticated agents [10] might
be able to take into account this market-impact correction. We will not allow for any
heterogeneity across the agents in this respect, but will instead study the dynamics

$$u_{i\sigma}(\ell+1) = u_{i\sigma}(\ell) + \Gamma\, \delta_{\sigma, k_i^{\omega(\ell)}} \left[ R(\ell) - p(\ell) - \frac{\eta}{N} z_{i\sigma}(\ell) \right]. \tag{8}$$

Here $\eta \geq 0$ is a further model parameter which accounts for the (uniform) ability of the
agents to estimate the impact of their trading actions on the price. For $\eta = 0$ agents act
as 'price takers' and neglect the effect of the term proportional to $z_{i\sigma}(\ell)/N$ when updating the propensities;
on the other hand for $\eta = 1$ they fully correct for their impact on the price process.
Tuning $\eta \in [0,1]$ allows one to consider agents of increasing sophistication, similar to what
is done in MGs with market-impact correction. Finally, the propensities $\{u_{i\sigma}(\ell)\}$ are
related to the investments $\{z_{i\sigma}(\ell)\}$ made at time step $\ell$ via

$$z_{i\sigma}(\ell) = u_{i\sigma}(\ell)\, \theta[u_{i\sigma}(\ell)], \tag{9}$$

where $\theta(x)$ is the step function, i.e. $\theta(x) = 1$ for $x > 0$ and $\theta(x) = 0$ otherwise§.

The model may thus be summarised as follows. At each time step $\ell$ a 'state
of the world' $\omega(\ell) \in \{1,\ldots,\Omega\}$ is drawn at random. Each agent $i$ then receives a
signal $k_i^{\omega(\ell)} \in \{-1,1\}$ corresponding to his private information structure. Given the
perceived signal he decides to invest an amount $z_{i k_i^{\omega(\ell)}}(\ell) \geq 0$ according to (9), based
on the propensity $u_{i k_i^{\omega(\ell)}}(\ell)$. From the investments of all agents a total price $p(\ell)$ is
determined via the market clearing condition (3). Each individual agent $i$ then updates
the propensity $u_{i\sigma}$ for the perceived signal $\sigma = k_i^{\omega(\ell)}$ according to (8), and leaves the
propensity for the opposite signal $-\sigma$ unchanged.
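
As an illustration of the update cycle summarised above, the following is a minimal Python sketch of the on-line dynamics built from the price rule (3), the learning rule (8) and the investment rule (9). It is not the code used for the simulations reported in this paper. The function name `simulate_online`, the default parameter values, the uniform initial condition `u0` and the array layout are assumptions made for the example; returns are drawn as a constant plus a Gaussian term scaled by $1/\sqrt{N}$, which is our reading of (1).

```python
import numpy as np


def simulate_online(N=50, alpha=1.0, R_bar=1.0, lam=1.0, Gamma=0.1,
                    eta=0.0, u0=1.0, rounds=2000, seed=0):
    """Sketch of the on-line game; all names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    Omega = max(1, int(round(alpha * N)))  # number of states of the world

    # Quenched disorder: returns per state and private signals k[i, omega].
    R = R_bar + lam * rng.standard_normal(Omega) / np.sqrt(N)
    k = rng.choice([-1, 1], size=(N, Omega))

    # u[i, 0] is the propensity under signal -1, u[i, 1] under signal +1.
    u = np.full((N, 2), float(u0))
    agents = np.arange(N)
    prices = np.empty(rounds)
    returns = np.empty(rounds)

    for step in range(rounds):
        omega = rng.integers(Omega)            # state drawn uniformly at random
        col = (k[:, omega] + 1) // 2           # column of the perceived signal
        z = np.maximum(u[agents, col], 0.0)    # z = u * theta(u), rule (9)
        p = z.sum() / N                        # price, rule (3)
        # Only the propensity for the perceived signal is updated, rule (8).
        u[agents, col] += Gamma * (R[omega] - p - eta * z / N)
        prices[step] = p
        returns[step] = R[omega]

    return prices, returns, u
```

Calling `simulate_online(eta=0.0)` corresponds to price-taking agents, while `eta=1.0` switches on the full market-impact correction; the returned arrays can be used to monitor how closely prices track returns over time.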

§ In [1] it is argued that the key features of the model remain unchanged for other choices for the
map $u_{i\sigma}(\ell) \mapsto z[u_{i\sigma}(\ell)]$, as long as this map is increasing and guarantees $\lim_{u \to \infty} z(u) = \infty$ and
$\lim_{u \to -\infty} z(u) = 0$. In the simulations presented below the expression given in (9) was used.

We shall mostly be interested in the stationary state(s) of the model, and in
particular in the question of whether agents can co-ordinate efficiently even if their
access to the global signal is filtered through their private information schemes. We will
refer to 'market efficiency' as the situation in which returns are fully reflected by the
prices. In order to quantify the efficiency, or otherwise, of the market we consider the
'squared distance' between prices and returns in the steady state||:



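$$H = \sum_{\omega=1}^{\Omega} \langle R(\ell) - p(\ell) \,|\, \omega \rangle^2, \tag{10}$$

where $\langle \cdots | \omega \rangle$ stands for a time-average in the steady state conditioned on the occurrence
of state $\omega$. Phases in which prices equal returns in all states $\omega$, and in which accordingly
$H = 0$, are said to be efficient, whereas phases with $H > 0$ are referred to as inefficient.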

A second quantity of interest measures how differently the agents behave upon
receiving the two different signals, and is defined as follows:

$$q = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{\langle z_{i,1} \rangle - \langle z_{i,-1} \rangle}{2} \right]^2. \tag{11}$$

Here $\langle \ldots \rangle$ stands for a time-average in the stationary state. $q$ measures the extent to
which agents use the information available to them. Small values of $q$ indicate that the
investment they make is nearly independent of the observed signal, while the perceived
signal is strongly correlated with their trading actions for large values of $q$.
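
The two observables can be estimated from simulation output along the following lines. This is a hedged sketch, not the measurement code behind the figures: it assumes that $H$ is the sum over states of the squared conditional time-average of $R - p$ (as suggested by the order-of-magnitude remark in the footnote) and that $q$ is the population average of the squared half-difference of the time-averaged investments under the two signals (as suggested by the later identification $q = \langle y^2 \rangle$). The function names and argument conventions are hypothetical.

```python
import numpy as np


def efficiency_H(omegas, gaps, Omega):
    """Sum over states of the squared conditional time-average of R - p.

    omegas: integer array with the state visited at each measured step
    gaps:   array of R - p at the same steps
    Omega:  total number of states (unvisited states are skipped)
    """
    sums = np.bincount(omegas, weights=gaps, minlength=Omega)
    counts = np.bincount(omegas, minlength=Omega)
    visited = counts > 0
    conditional_mean = sums[visited] / counts[visited]
    return float(np.sum(conditional_mean ** 2))


def information_use_q(z_plus_mean, z_minus_mean):
    """Population average of the squared half-difference of mean investments."""
    y = 0.5 * (np.asarray(z_plus_mean, float) - np.asarray(z_minus_mean, float))
    return float(np.mean(y ** 2))
```

With short measurement windows some states may never be visited; the sketch simply omits them, which biases the estimate of $H$ downwards, so in practice the window should be long compared with the number of states.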

Before turning to the details of the further analysis, let us briefly summarise the
static picture of the model as found in [1]. For $\eta = 0$ the system undergoes a phase
transition from an efficient to an inefficient regime at a critical value $\alpha_c$, which depends
on the details of the disorder statistics. In the sub-critical phase at $\alpha < \alpha_c$, where $H = 0$,
the asymptotic value of $q$ depends on initial conditions. For the game with sophisticated
agents ($\eta = 1$), instead, no such transition occurs and the system is inefficient for all $\alpha$.
Moreover the dynamics is ergodic. Loosely speaking, $q$ turns out to be larger for the game
without impact-correction than it is for $\eta = 1$. Thus, naive agents use the information
more than sophisticated ones, though the latter have larger gains.

3. Generating functional analysis 

3.1. Batch dynamics 

The learning rule (8) corresponds to what is known as an 'on-line' model in the context
of the MG [4], with an explicit dependence on $\omega(\ell)$ at each time step $\ell$. It is analytically
convenient to replace this type of updating scheme by one in which an effective average
over all values of $\omega(\ell) \in \{1,\ldots,\alpha N\}$ is carried out at every time step, as in MGs [4].
The resulting 'batch' model roughly describes a situation in which propensity updates
are performed only once every $O(\Omega)$ time steps, and is defined by



|| It will turn out that $\langle R(\ell) - p(\ell) | \omega \rangle = O(N^{-1/2})$, so that $H$ as defined above is indeed of order one.
This is observed in numerical simulations, but will also become manifest in the course of the generating
functional analysis presented below.




Figure 1. Order parameters $q$ and $H/\alpha$ vs. $\alpha$ at fixed $\bar{R} = \Gamma = \lambda = 1$ for the batch
and on-line games ($\eta = 0$), for different initial conditions, $u_{i\sigma}(0) = \sigma u_0$ for all $i$. Open
symbols are results from simulations of the on-line game, solid symbols correspond to
the batch process. Numerical simulations are performed at $\alpha N^2 = 10^4$ and all data are
averages over 100 samples of the disorder. Measurements are taken over $5 \cdot 10^4$ steps
in the on-line and 7500 steps in the batch games, respectively, preceded by $15 \cdot 10^4$
equilibration steps in the on-line case, and 22500 equilibration steps for the batch
dynamics.



where now the price at round $\ell$ is effectively a vector of prices, $\{p^\omega(\ell)\}$, one for each state
$\omega$: $N p^\omega(\ell) = \sum_i z_{i k_i^\omega}(\ell)$. This modification has turned out to have little effect on the
steady state properties of conventional MGs [11], but carries the advantage of simplifying
the analytical theory considerably. Numerical results confirm that the batch and on-line
versions of the present model are qualitatively and quantitatively very similar as far as
the order parameters $q$ and $H$ and the breakdown of ergodicity at $\alpha_c$ are concerned, see
Fig. 1. Notice that in the present batch model the learning rate $\Gamma$ effectively fixes a
time-scale; it also has a subtle influence on transients. It is convenient to introduce the
variables



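$$x_i(\ell) = \frac{1}{2} \sum_{\sigma \in \{-1,1\}} z_{i\sigma}(\ell), \qquad y_i(\ell) = \frac{1}{2} \sum_{\sigma \in \{-1,1\}} \sigma\, z_{i\sigma}(\ell), \tag{13}$$

in terms of which one has $z_{i\sigma}(\ell) = x_i(\ell) + \sigma y_i(\ell)$, and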

r 



u, 



Ui 



+ 



R 



N 



+ 



ON 



y^Ja,k?Zia(£) 



+ h ia (£), 



(14) 



where we have introduced external perturbation fields $\{h_{i\sigma}(\ell)\}$ which will be used later
in order to generate response functions. While the dynamics of the MG with $S = 2$
strategies per player can be written in terms of one degree of freedom per agent (the
so-called score difference), no such simplification can be made in the present game and
we have to keep both the $\{x_i(\ell)\}$ and the $\{y_i(\ell)\}$. Writing $\delta_{\tau,\tau'} = (1 + \tau\tau')/2$ for $\tau, \tau' \in
\{-1,1\}$, (14) can be simplified further using the facts that $\lim_{\Omega \to \infty} (1/\Omega) \sum_\omega k_i^\omega = 0$
and $\lim_{\Omega \to \infty} (1/\Omega) \sum_\omega \delta_{\sigma, k_i^\omega} = 1/2$. Subsequently, one may re-scale time to arrive at

r To - i 

Uic {t + 1) = Uic {t) + - EP- ^-(*)] + ^ E ^ - 2 Tr]aZta{t) 

E E fc " + m*)> (is) 

where $t$ is now a re-scaled time. Equation (15) is the process we shall consider.



3.2. The effective-agent process 



The aim of every dynamical theory of disordered systems is to derive a closed set of 
equations for the behavior of macroscopic correlation and response functions in the limit 
$N \to \infty$. We apply the path-integral method first devised for systems with quenched
disorder by De Dominicis [9]. This technique is based on the computation of the moment 
generating functional 



exp 



i,a,t 



= J Du p(u(0)) exp 



Yl S [equation (15)] , (16) 



t,cr,i 



where $\psi = \{\psi_{i\sigma}(t)\}$ are generating source fields and where $\langle\langle \cdots \rangle\rangle$ denotes an average
over the process (14) with respect to the distribution $p(u(0))$ of initial conditions
$u(0) = \{u_{i\sigma}(0)\}$ from which the dynamics is started. The shorthand $Du$ stands
for $\prod_{t,\sigma,i} [du_{i\sigma}(t)/\sqrt{2\pi}]$. Correlation and response functions can be obtained by
taking derivatives of the disorder-averaged generating functional $Z[\psi]$ with respect to
the source fields $\{\psi_{i\sigma}(t)\}$ and/or the perturbing fields $\{h_{i\sigma}(t)\}$ and by subsequently
considering the limit of vanishing fields. Our ultimate goal is to obtain self-consistent
equations for these physically relevant observables in the limit $N \to \infty$.

The calculation can be performed along the following lines. First, one expresses
the $\delta$-distributions in (16) in their exponential representation (we denote the conjugate
parameters of the $\{u_{i\sigma}(t)\}$ by $\{\hat{u}_{i\sigma}(t)\}$). Next, the average over the quenched disorder
$\{R^\omega\}$ and $\{k_i\}$ is evaluated. This is done conveniently by introducing the parameters

« w (0 = -LE^' ^ = 77^ ( 17 ) 



N 



where yi{t) is given by (13) while Wi(t) stands for 

1 
2 



Wi(t) 



r(t). 



(18) 



<T6{-1,1} 
As usual, disorder-averaging generates macroscopic objects (dynamical order
parameters). In the present problem, they are given by

G(*>0 = ^E>(0y*(O, L (M') = ^E>*(*H(0, (is) 

i i 



These definitions can again be inserted into the generating functional via $\delta$-distributions. Finally, factorising
the resulting expression over agents and states wherever possible one arrives at the
familiar form

ZpJ = J e N ^ + * +T)+0 ^ ) DQDQDKDKDLDLDmDmDcpDtp. (20) 

The functions = ^(m, ip, Q, Q, L,L, K ,K), $ = $(m, <p,Q,L,K) and T = 
T(m, tp,Q, L, K) are given respectively by 

* = 1 [<?(*> OQft + L (t, t')L(t, t') + K(t, t')k(t, t' 



T 



= a log y 



DaDaDbDb exp 



jjs^a, a, 6, 6) 



H^K(O)) 



exp 



-ira^m(t)^), (21) 
t 

(22) 

i^^W^W-i^mW [i?-x(t)j 



t,(T 



x exp 



x exp 



-i £ [§(t, 02/(^)2/(0 + L(t, t')w(t)w(t') + % t')y(t)w(t')] 
t,t> 



t,cr 



1 1 

+ 1) - it ff (i) - /^(f) + + -Tr]az a (t) 



(23) 



with the shorthands Da = \\ t [da(t) / y/2nj (and similarly for Da, Db, Db) and 



Du 



W tcr du cr {t)/y^K (and similarly for Du), and 

S(a,a, 6,6) = i^ [a(t)a(t) + b(t)b(t) + Fa(t)[b(t) + ip(t)} - Tb(i)m(t) 
t 

M*' + L (^ t')b(t)b(t') + 2tf(t, t')a(t)b(t') 



t,t' 

r 2 A 2 



E 6(06(o- 



(24) 



Note that we have assumed that the distribution of initial conditions factorizes over
agents and the index $\sigma$ and that the distribution of the starting points is identical for all
agents, so that $p(u(0)) = \prod_{i\sigma} p_\sigma(u_{i\sigma}(0))$. The integrals in (20) can then be performed
by the method of steepest descents in the limit $N \to \infty$. The dominant contributions
here come from terms of order $N$ in the exponent, and are contained in the functions
given above. All terms of sub-leading order carry zero weight in the thermodynamic
limit. The saddle-point conditions can then be worked out by taking derivatives of the
exponent in (20) with respect to the integration variables. Differentiation with respect 
to rh and (p leads to the relations 

( X (t) - r)^ = o =► £ x; mo>* = a, 



(25) 
(26) 



where (• • denotes an average with respect to the probability measure defined by the 
single-agent process described by T: 

jDuDu [ILMMO))] ■■■M(u,u) e-^t-Wp-W] 



M(u, u) = exp 



[Q(t,t')y(t)y(t') +L(t,t')w(t)w(t') + K(t,t')y(t)w(t') 



(27) 



t,t' 



x exp 



t,a 



u a {t + 1) - u a (t) - h a (t) + -<p(t) + ^Tr]az a (t) 



.(28) 



All order parameters appearing in this measure take their saddle-point values and the
auxiliary fields $\psi$ have been set to zero; also, we assumed $h_{i\sigma}(t) = h_\sigma(t)$ for all $i$. Notice
that (25) implies that the average investment equals R. We will henceforth set m = 
and use (25) as an additional condition to be satisfied by the solutions. As for (p, its 
physical meaning and value will become clear when the steady state will be worked out 
explicitly in the next section. The saddle-point conditions for m and <p read 

m(f) = <a(f)>, , ¥>(*) = -<&(*)>., (29) 

where 



("•>.= 



f ■ ■ ■ exp 


S(a,a, b, b) 


DaDaDbDb 


/exp 


S(a,a, b, b) 


DaDaDbDb 



(30) 

cp = 0, the integrals 



These two equations admit the self-consistent solution m 
vanishing due to symmetry. 

It remains to compute the saddle-point values of the order parameters $\{Q, L, K\}$
and of their conjugates $\{\hat{Q}, \hat{K}, \hat{L}\}$. Extremisation of the exponent in (20) with respect
to $\{\hat{Q}, \hat{K}, \hat{L}\}$ gives

Q(M') = (y(t)y(t% = iE^^MO)*, 



L(t,t') = (w(t)w(t')l = \j2 aT (M*)«r(0>*> 

<T,T 

K^t') = (y(t)w(t>))+ = \^r{ Zu {t)u T {t'))^. 



(31) 
(32) 
(33) 
By taking derivatives of the generating functional with respect to the fields $\{\psi, h\}$ one
may check that $Q$ and $K$ can be identified with correlation and response functions of
the original multi-agent system

i N 

Q(f,o = ,j™ ^Ewowm (34) 



i=l 



^(M') = ilim— Y y r 9 ^fl (35) 

$L$ vanishes identically by virtue of the built-in normalization $Z[0] = 1$ (see [11]). Finally
the saddle-point equations corresponding to the integrations over $\{Q, L, K\}$ read

QM= 'm^r L{tX ^ [ mry KM=i wm- m 

$\Phi$ can be evaluated explicitly by successively integrating (22) over the $\{a(t)\}$, $\{\hat{a}(t)\}$
and $\{b(t)\}$. Taking the required derivatives one finds that

$$\hat{Q} = 0, \qquad \hat{L} = -\tfrac{i}{2}\alpha\Lambda, \qquad \hat{K} = -\alpha\Gamma(1 - i\Gamma K)^{-1}, \tag{37}$$

where

$$\Lambda = (1 - i\Gamma K)^{-1} D\, (1 - i\Gamma K^T)^{-1}, \tag{38}$$

$$D(t,t') = \Gamma^2 [\lambda^2 + Q(t,t')]. \tag{39}$$

Motivated by (35), we shall henceforth set $G = -iK$. Inserting the saddle-point
conditions into (28) leads to the effective-agent process

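$$u_\sigma(t+1) = u_\sigma(t) + h_\sigma(t) - \frac{1}{2}\varphi(t) - \frac{\alpha\Gamma\sigma}{2} \sum_{t' \leq t} (1 + \Gamma G)^{-1}(t,t')\, y(t') - \frac{1}{2}\Gamma\eta\alpha\, z_\sigma(t) + \frac{\sigma\sqrt{\alpha}}{2}\, \xi(t), \tag{40}$$

where $\sigma \in \{-1,1\}$ and $z_\sigma(t) = u_\sigma(t)\theta[u_\sigma(t)]$, while $\xi(t)$ is a zero-average Gaussian noise
with temporal correlations given by the covariance matrix

$$\langle \xi(t)\xi(t') \rangle = \Lambda(t,t'). \tag{41}$$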

Note that the representative agent is here described by a pair of processes, $u_\sigma(t)$,
$\sigma \in \{-1,1\}$, at variance with the versions of the MG studied so far, where one has
only one process (for the so-called score difference). The one-time function $\varphi$ and the two-
time functions $Q$ and $G$ are the dynamical order parameters of the problem, to be
determined self-consistently according to (31) and (33), where the average $\langle \cdots \rangle_\star$ is over
the effective process (40), i.e. over realisations of $\{\xi(t)\}$, subject to the constraint (25).
As usual, the resulting self-consistent effective agent problem is fully equivalent to the
original coupled $N$-particle dynamics in the thermodynamic limit $N \to \infty$ in the sense
that $Q$ and $G$ are the correlation and response functions of the original batch problem.
The non-trivial correlation structure of the single-particle noise and the non-Markovian
term coupling to times $t' < t$ in the effective process (40) are direct consequences of
the presence of quenched disorder in the original multi-agent system, and impede a full
analytical solution for the dynamical order parameters at all times. For models of the
present type the alternative numerical iteration scheme provided by [12] is restricted
to $O(10^2)$ time steps due to computational limitations. In our case equilibration times
turn out to be much larger. Note that the constraint $\langle x(t) \rangle_\star = \bar{R}$ (for all times $t$) is not
an external one as for example in spherical models [13], but rather it is self-generated
by the system in the thermodynamic limit. For this reason $\varphi$ has no direct analogue in
the original $N$-particle problem, but appears only in the effective process.

3.3. Average price/return fluctuations 

The fluctuations of the difference between price and return are given by

$$d(t,t') = \lim_{N \to \infty} N\, \langle\langle [R(t) - p(t)][R(t') - p(t')] \rangle\rangle \tag{42}$$

(note that the above quantity with the explicit pre-factor $N$ turns out to be of order
one). This is equivalent to

d(t, t 1 ) = Jim i £ (( [m(f) +R»- a«(t)] [m(t>) + R* - a«(t>)] }}, (43) 

UI 

where we made use of (17). Proceeding as before and integrating by parts over b w (t) 
one sees that 

d(t, t') = Jim ^ £ ((^(t)^(tO)). (44) 

UI 

Anticipating that in the limit N — > oo the behavior of d(t, t') will be dominated by the 
same saddle-point describing ((Z [■»/>])) and at which m = cp = we have 

d( M ') = J_ J DQDQDKDKDLDLDrhDcp e Ar(*+r) +( n-i)*/a + o(VF) 



x DaDaDbDbb(t)b(t') exp i ^ [a(t)a(t) + b(t)b(t) + Fa(t)b(t)^ 

it 

l - J2 [T 2 X 2 b(t)b(t') + Q(t,t')a(t)a(t') + 2K(t,t')a(t)b(t') 



(45) 



x exp 



Integrating over $a$, $\hat{a}$ and $b$ we find

$$d(t,t') = \Lambda(t,t')/\Gamma^2 = \left[ (1 + \Gamma G)^{-1} (\lambda^2 E + Q) (1 + \Gamma G^T)^{-1} \right](t,t'), \tag{46}$$

so that indeed $d(t,t')$ as defined above is of order one. $E$ is the matrix with all entries
equal to one.
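
For a finite time window the matrix expression in (46) can be evaluated numerically once the correlation and response matrices are known. The sketch below is an illustration only: the function name `price_return_fluctuations` and the convention that `Q` and `G` are supplied as square arrays indexed by time are assumptions, and nothing here determines $Q$ and $G$ themselves, which must come from the self-consistent theory or from measurement.

```python
import numpy as np


def price_return_fluctuations(Q, G, Gamma, lam):
    """Evaluate (1 + Gamma G)^{-1} (lam^2 E + Q) (1 + Gamma G^T)^{-1}.

    Q, G:  square arrays over a finite time window (illustrative inputs)
    Gamma: learning rate
    lam:   standard deviation parameter of the returns
    """
    Q = np.asarray(Q, dtype=float)
    G = np.asarray(G, dtype=float)
    T = Q.shape[0]
    E = np.ones((T, T))                    # matrix with all entries equal to one
    A = np.eye(T) + Gamma * G
    A_inv = np.linalg.inv(A)
    # (1 + Gamma G^T)^{-1} is the transpose of (1 + Gamma G)^{-1}.
    return A_inv @ (lam ** 2 * E + Q) @ A_inv.T
```

For a causal response matrix (zero on and above the diagonal) the matrix `A` is unit lower-triangular, so the inverse always exists; for long windows a triangular solver would be cheaper than a general inverse.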

Following similar steps one can calculate the average price- return difference, namely 
A(t) = lim VN PFlf. (47) 

N— >oo 
One finds



A(t) = i J DQDQDKDKDLDLDrhDCp e "(*+T)+(n-i)*/«+ofc/jv) 



I 



DaDaDbDb b(t) exp 



[a(t)a(t) + b(t)b(t) + Ta{t)b(t) 



1 

t 



(48) 



x exp 



\ \T 2 \ 2 b(t)b(t') + Q(t,t')a(t)a(t') + 2K(t,t')a(t)b(t') 

This integral vanishes due to symmetry, hence A(t) is a zero-average process with 
temporal correlations d(t, t') of order one. 

4. Ergodic steady states 

4.1. General considerations

Due to the presence of the coloured noise $\xi(t)$ and of the retarded self-interaction in the
effective process (40) it is in general not feasible to solve the self-consistent system
{(25), (31), (33)} analytically for all times $t, t'$. We shall therefore restrict ourselves
to studying the ergodic steady states of the effective-agent problem. These are
time-translation invariant solutions,

$$\lim_{t \to \infty} Q(t+\tau, t) = Q(\tau), \qquad \lim_{t \to \infty} G(t+\tau, t) = G(\tau), \qquad \lim_{t \to \infty} \varphi(t) = \varphi, \tag{49}$$

without long-term memory,

$$\lim_{t \to \infty} G(t,t') = 0 \quad \forall\, t' \text{ finite}, \tag{50}$$

and with finite integrated response,

$$\lim_{t \to \infty} \sum_{\tau \leq t} G(\tau) =: \chi < \infty. \tag{51}$$

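With these Ansätze one performs a time-average of the effective process (40), leading
to

$$\bar{u}_\sigma = -\frac{1}{2}\varphi - \frac{1}{2}\Gamma\eta\alpha\, z_\sigma - \frac{\alpha\Gamma\sigma/2}{1 + \Gamma\chi}\, y + \frac{\sigma\sqrt{\alpha}}{2}\, \xi. \tag{52}$$

Here, we have introduced $\bar{u}_\sigma = \lim_{t \to \infty} u_\sigma(t)/t$ (roughly representing the 'velocity' with
which the propensities grow in time), as well as the static variables

$$z_\sigma = \lim_{t \to \infty} \frac{1}{t} \sum_{t' \leq t} z_\sigma(t'), \qquad y = \lim_{t \to \infty} \frac{1}{t} \sum_{t' \leq t} y(t'), \qquad \xi = \lim_{t \to \infty} \frac{1}{t} \sum_{t' \leq t} \xi(t'). \tag{53}$$

$h_\sigma(t)$ has been set to zero. Note that $\xi$ is (static) Gaussian noise of zero mean and
variance

$$\langle \xi^2 \rangle = \frac{\Gamma^2 (\lambda^2 + q)}{(1 + \Gamma\chi)^2} \tag{54}$$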
and that the $z_\sigma$, $\sigma \in \{-1,1\}$, as well as $y$ are stochastic variables, coupled to the value
of $\xi$ via Eq. (52). The parameter

$$q = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau \leq t} Q(\tau) \tag{55}$$

represents the persistent part of the correlation function which, together with $\chi$ and $\varphi$,
is to be determined self-consistently from

$$\langle x \rangle = \bar{R}, \qquad q = \langle y^2 \rangle, \qquad \chi = \frac{1}{\sqrt{\alpha}} \left\langle \frac{\partial y}{\partial \xi} \right\rangle. \tag{56}$$

Note that, up to a pre-factor, the noise acts effectively as an external field, so that $\chi$
can be expressed in terms of a derivative with respect to $\xi$. Recalling (13) one realises
that $q$ is indeed the observable defined in (11).

In the stationary state the matrix $d(t,t')$ will also be time-translation invariant,
and similar to the MG one finds

$$H = \alpha \lim_{\tau \to \infty} d(\tau) = \alpha\, \frac{\lambda^2 + q}{(1 + \Gamma\chi)^2}. \tag{57}$$

The magnitude of the fluctuations of the price around its temporal mean is given by 
bp 2 ^a-^iiP- (Pk))» = d ( r = °) - H/a. (58) 

For this quantity one has the exact relation

$$\delta p^2 = \left[ (1 + \Gamma G)^{-1} (\lambda^2 E + Q) (1 + \Gamma G^T)^{-1} \right](0) - H/\alpha. \tag{59}$$

A further evaluation of $\delta p^2$ hence requires in principle the full knowledge of the transient
contributions to the correlation and response functions. While the analogous quantity
in the MG can be well approximated in terms of persistent order parameters, such an
estimate appears much more subtle here due to an explicit dependence of the fluctuations
on the learning rate. Defining $\tilde{Q}(t,t') = Q(t,t') - q$ one has

$$\delta p^2 = \left[ (1 + \Gamma G)^{-1} \tilde{Q}\, (1 + \Gamma G^T)^{-1} \right](0), \tag{60}$$

where $\tilde{Q}$ is a measure of the fluctuations of the variables $y_i(t)$ around their temporal
averages, and depends on the learning rate. For $\Gamma \to 0$ no fluctuations are present, so
that $\delta p^2 \to 0$, as pointed out in [1]. In general, the right-hand side is an increasing
function of $\Gamma$, which is difficult to express in terms of persistent order parameters within
the present setup. We choose here to focus on the asymptotics of the model.

In order to proceed with the analysis of (52) one inspects the behaviour of the
$\{u_{i\sigma}(t)\}$ in numerical simulations, and formulates suitable Ansätze for $\bar{u}_\sigma$, corresponding
to different types of solutions observed in simulations. For purposes of clarity, we will
treat the cases $\eta = 0$ and $\eta > 0$ separately in the following.



Dynamics of adaptive agents with asymmetric information 

Figure 2. Phase diagram of the model without impact-correction in the $(\alpha, \lambda/\bar{R})$-plane
(logarithmic axes). The solid line indicates the ergodic/non-ergodic (NE) transition at $\alpha_c$,
given by Eq. (71); the dashed line is the transition between the 2- and the 1+0-phases at
$\alpha_\star$, see Eq. (70).



4.2. $\eta = 0$

Simulations of the model without impact-correction reveal the existence of three distinct
phases, two ergodic ones and a third with anomalous response:

• For large $\alpha$, greater than a critical value $\alpha_\star$, both propensities are positive
on average and remain finite in the steady state for all agents. Each agent
asymptotically invests finite amounts under both signals. The corresponding
effective agent has $\bar{u}_\sigma = 0$ for both $\sigma \in \{-1,1\}$. We shall refer to this phase
as the '2-phase', indicating that all agents invest under both signals.

• For intermediate $\alpha$ agents are divided into two groups. Some agents do not invest
under either signal: both of their propensities decrease linearly with time, so that
$\bar{u}_\sigma < 0$ for both $\sigma \in \{-1,1\}$. Each of the remaining agents invests under one signal
(say $\sigma_i$ for player $i$) but not under the opposite signal $-\sigma_i$. For such agents, the
propensity $u_{i\sigma_i}(t)$ is positive and remains finite asymptotically, while the other one
decreases linearly in time. This corresponds to trajectories of the effective process
with $\bar{u}_\sigma = 0$ for one value $\sigma \in \{-1,1\}$ and $\bar{u}_{-\sigma} < 0$. We will refer to this phase
as the '1+0-phase', indicating that some players do not invest under either signal,
while others invest under precisely one signal.

• For low $\alpha$, less than a critical value $\alpha_c$, one finds that the macroscopic order
parameters of the steady state depend on initial conditions, see Fig. 1. Hence
the dynamics is non-ergodic, and we expect the integrated response $\chi$ to be infinite
in this regime. We will label this phase by NE (non-ergodic) in the following.

It will turn out that the relevant control parameters for the phase behaviour of the
model are given by $\alpha$ and the ratio $\lambda/\bar{R}$ of the standard deviation of the return to its mean;
the learning rate $\Gamma$ has no influence on the persistent order parameters in the stationary
states, but only on the transients of the dynamics. The resulting phase diagram in the
$(\alpha, \lambda/\bar{R})$-plane is depicted in Fig. 2.

We will now study the two ergodic phases separately, starting with the phase at
intermediate values of $\alpha$, and will compute the persistent order parameters in the two
ergodic phases, as well as the boundaries $\alpha_\star$ and $\alpha_c$ separating the three regimes.

4.2.1. The 1 + 0 phase at intermediate $\alpha$. Agents who do not invest under either signal
correspond to solutions of the effective agent process with

$$u_\sigma = -\bar u_\sigma < 0, \quad z_\sigma = y = 0, \quad \sigma \in \{-1, 1\}.$$ (61)

While the specific values of the $\bar u_\sigma$, $\sigma \in \{-1, 1\}$, will play no role for the further analysis,
we would like to stress that different realisations of the effective process (that is, different
realisations of the noise $\xi$) can in general lead to different values for the $\bar u_\sigma$, $\sigma \in \{-1, 1\}$,
and for the $z_\sigma$. For agents who invest under one signal, but not under the other, we will
inspect solutions of the effective process of the following two types:

$$(u_+, u_-) = (0, -u), \quad (z_+, z_-) = (2y, 0), \quad y > 0,$$ (62)

$$(u_+, u_-) = (-u, 0), \quad (z_+, z_-) = (0, -2y), \quad y < 0.$$ (63)

The former corresponds to agents who invest under signal $\sigma = 1$, the latter to agents
who play upon receiving $\sigma = -1$. In both cases, $u$ takes a positive value, which again
in principle may vary for different realisations of the effective process. In either case
summing the two relations (52) (with $\eta = 0$) leads to

$$u = \varphi.$$ (64)

In particular $u$ takes the same value for all (effective) agents who invest under exactly
one signal. Self-consistency demands $\varphi > 0$ as we have assumed above that $u$ is positive.
Taking the difference of the two equations of (52) we find

$$\frac{\alpha\Gamma}{1 + \Gamma\chi}\,|y| = -\varphi \pm \sqrt{\alpha}\,\xi,$$ (65)

where the plus sign describes the case $y > 0$ while the minus sign holds when $y < 0$.
Setting $\xi^\star = \varphi/\sqrt{\alpha}$ one sees that the former case is realized when $\xi > \xi^\star$, the latter when
$\xi < -\xi^\star$. For $|\xi| < \xi^\star$ no solution with $|y| > 0$ is possible and the corresponding effective
agent never invests, as discussed above. We conclude that the physical interpretation
of $\varphi$ is closely related to the relative weight of the two types of agents. Indeed, the
probability that an agent is inactive, that is the fraction of agents who do not invest
under either signal, is given by

$$\phi_0 = \langle \theta(\xi^\star - |\xi|) \rangle = \mathrm{erf}\left(\frac{\xi^\star}{\sqrt{2\gamma}}\right),$$ (66)

where $\gamma = \langle \xi^2 \rangle$, see (54). The persistent order parameters $\varphi$, $q$ and $\chi$ are obtained from
the self-consistency relations

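$$\left\langle \frac{z_+ + z_-}{2} \right\rangle = R, \quad \langle y^2 \rangle = q, \quad \frac{1}{\sqrt{\alpha}} \left\langle \frac{\partial y}{\partial \xi} \right\rangle = \chi,$$ (67)

which may be written as

$$\begin{aligned}
\frac{\Gamma R \sqrt{\alpha}}{1 + \Gamma\chi} &= \langle (\xi - \xi^\star)\,\theta(\xi - \xi^\star) \rangle + \langle (-\xi - \xi^\star)\,\theta(-\xi - \xi^\star) \rangle, \\
\frac{\alpha\Gamma^2 q}{(1 + \Gamma\chi)^2} &= \langle (\xi - \xi^\star)^2\,\theta(\xi - \xi^\star) \rangle + \langle (-\xi - \xi^\star)^2\,\theta(-\xi - \xi^\star) \rangle, \\
\frac{\alpha\Gamma\chi}{1 + \Gamma\chi} &= \langle \theta(\xi - \xi^\star) \rangle + \langle \theta(-\xi - \xi^\star) \rangle.
\end{aligned}$$ (68)

After carrying out the remaining integrations over $\xi$ one finds

$$\begin{aligned}
\frac{\Gamma R \sqrt{\alpha}}{1 + \Gamma\chi} &= \sqrt{\frac{2\gamma}{\pi}}\, \exp\left(-\frac{\varphi^2}{2\alpha\gamma}\right) - \frac{\varphi}{\sqrt{\alpha}}\, \mathrm{erfc}\left(\frac{\varphi}{\sqrt{2\alpha\gamma}}\right), \\
\frac{\alpha\Gamma^2 q}{(1 + \Gamma\chi)^2} &= \left(\gamma + \frac{\varphi^2}{\alpha}\right) \mathrm{erfc}\left(\frac{\varphi}{\sqrt{2\alpha\gamma}}\right) - \frac{\varphi}{\sqrt{\alpha}} \sqrt{\frac{2\gamma}{\pi}}\, \exp\left(-\frac{\varphi^2}{2\alpha\gamma}\right), \\
\frac{\alpha\Gamma\chi}{1 + \Gamma\chi} &= \mathrm{erfc}\left(\frac{\varphi}{\sqrt{2\alpha\gamma}}\right).
\end{aligned}$$ (69)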
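The Gaussian integrations behind these relations can be checked numerically. The following is a minimal sketch, assuming only that $\xi$ is a zero-mean Gaussian variable of variance $\gamma$ and that $\xi^\star$ denotes the activity threshold; the function names are illustrative and not part of the original analysis. It compares closed-form expressions for the averages $\langle (\pm\xi - \xi^\star)^k \theta(\pm\xi - \xi^\star) \rangle$, summed over both signs for $k = 0, 1, 2$, with a direct midpoint-rule quadrature.

```python
import math


def gaussian_averages(xi_star, gamma):
    """Closed forms of sum_{s=+1,-1} <(s*xi - xi_star)^k theta(s*xi - xi_star)>
    for k = 0, 1, 2, with xi ~ N(0, gamma) (assumed zero-mean Gaussian)."""
    zeta = xi_star / math.sqrt(2.0 * gamma)
    gauss = math.exp(-zeta * zeta) / math.sqrt(math.pi)
    erfc = math.erfc(zeta)
    m0 = erfc
    m1 = math.sqrt(2.0 * gamma) * (gauss - zeta * erfc)
    m2 = gamma * ((1.0 + 2.0 * zeta * zeta) * erfc - 2.0 * zeta * gauss)
    return m0, m1, m2


def gaussian_averages_midpoint(xi_star, gamma, n=20000):
    """Same averages by midpoint quadrature over [xi_star, xi_star + 12 sigma].
    The two signs contribute equally by symmetry, hence the factor 2."""
    sigma = math.sqrt(gamma)
    h = 12.0 * sigma / n
    norm = 1.0 / math.sqrt(2.0 * math.pi * gamma)
    m0 = m1 = m2 = 0.0
    for k in range(n):
        d = (k + 0.5) * h          # d = xi - xi_star >= 0
        xi = xi_star + d
        w = norm * math.exp(-xi * xi / (2.0 * gamma)) * h
        m0 += w
        m1 += d * w
        m2 += d * d * w
    return 2.0 * m0, 2.0 * m1, 2.0 * m2


if __name__ == "__main__":
    print(gaussian_averages(0.4, 1.3))
    print(gaussian_averages_midpoint(0.4, 1.3))
```

The zeroth moment is the fraction of active agents, so one minus its value reproduces the error-function expression for the fraction of inactive agents.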

This coupled system of non-linear equations is easily solved numerically (for example
using Newton-Raphson methods), and the order parameters $q$, $\chi$ and $\varphi$ may be
obtained as functions of $\alpha$ for any fixed values of the model parameters $R$, $\lambda$ and
$\Gamma$. The dependence of these persistent order parameters on the learning rate $\Gamma$ can be
understood by an inspection of (69). One finds that the solution is $\Gamma$-independent when
expressed in terms of the re-scaled variables $\{q, \Gamma\chi, \varphi/\Gamma\}$. $H$ can in turn be obtained
from (57).
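As an illustration of such a numerical solution, the sketch below solves the 1 + 0 phase for given $\alpha$, $\lambda$ and $R$. It rests on two assumptions that should be checked against the full text: the self-consistency relations are taken in the form $\Gamma R\sqrt{\alpha}/(1+\Gamma\chi) = \sqrt{2\gamma}\,A(\zeta)$, $\alpha\Gamma^2 q/(1+\Gamma\chi)^2 = \gamma B(\zeta)$ and $\alpha\Gamma\chi/(1+\Gamma\chi) = \mathrm{erfc}(\zeta)$ with $\zeta = \varphi/\sqrt{2\alpha\gamma}$, and the noise variance of (54), which is not reproduced in this section, is assumed to be $\gamma = \Gamma^2(\lambda^2 + q)/(1+\Gamma\chi)^2$. Under these assumptions the system reduces to the single equation $\alpha = B(\zeta) + 2(\lambda/R)^2 A(\zeta)^2$, which is solved here by plain bisection rather than Newton-Raphson. All function names are hypothetical.

```python
import math


def A(zeta):
    """A(zeta) = exp(-zeta^2)/sqrt(pi) - zeta*erfc(zeta)."""
    return math.exp(-zeta * zeta) / math.sqrt(math.pi) - zeta * math.erfc(zeta)


def B(zeta):
    """B(zeta) = (1 + 2 zeta^2) erfc(zeta) - 2 zeta exp(-zeta^2)/sqrt(pi)."""
    return ((1.0 + 2.0 * zeta * zeta) * math.erfc(zeta)
            - 2.0 * zeta * math.exp(-zeta * zeta) / math.sqrt(math.pi))


def solve_one_plus_zero_phase(alpha, lam, R, tol=1e-13):
    """Return q, Gamma*chi, phi/Gamma and phi_0 in the 1 + 0 phase.

    ASSUMPTION: gamma = Gamma^2 (lam^2 + q) / (1 + Gamma*chi)^2 (our reading of (54)).
    """
    ratio2 = (lam / R) ** 2

    def f(zeta):  # decreasing in zeta; its root fixes zeta = phi / sqrt(2 alpha gamma)
        return B(zeta) + 2.0 * ratio2 * A(zeta) ** 2 - alpha

    lo, hi = 0.0, 10.0
    if f(lo) <= 0.0:
        raise ValueError("alpha is not below alpha_star: phi would not be positive")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    zeta = 0.5 * (lo + hi)

    erfc = math.erfc(zeta)
    if alpha <= erfc:
        raise ValueError("alpha is not above alpha_c: the susceptibility diverges")
    g = lam ** 2 * alpha / (alpha - B(zeta))      # g = lam^2 + q
    gamma_chi = erfc / (alpha - erfc)             # rescaled susceptibility Gamma*chi
    return {
        "q": g - lam ** 2,
        "Gamma_chi": gamma_chi,
        "phi_over_Gamma": zeta * math.sqrt(2.0 * alpha * g) / (1.0 + gamma_chi),
        "phi_0": math.erf(zeta),                  # fraction of inactive agents
    }


if __name__ == "__main__":
    print(solve_one_plus_zero_phase(alpha=1.0, lam=1.0, R=1.0))
```

Only $q$, $\Gamma\chi$ and $\varphi/\Gamma$ are returned, which makes the $\Gamma$-independence of the re-scaled solution explicit: the learning rate never enters the reduced equation.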

This solution is valid self-consistently as long as $\chi$ turns out to be finite, and as
long as $\varphi$ comes out positive. The point at which the latter condition breaks down is
easily determined upon setting $\varphi = 0$ in the above coupled set of equations. After some
algebra one finds that this occurs at

$$\alpha = \alpha^\star = 1 + \frac{2\lambda^2}{\pi R^2},$$ (70)

which coincides with the static result [1]. The onset of anomalous response, i.e. the
point $\alpha_c$ at which $\chi$ diverges at fixed values of $\lambda$, $\Gamma$ and $R$, is found to be determined by
the condition‡

$$\alpha_c = 1 - \phi_0(\alpha_c).$$ (71)

$\alpha_c$ is obtained as $\alpha_c = \mathrm{erfc}(\zeta_c)$, where $\zeta_c$ is the root of

$$\frac{R}{\lambda} = \frac{e^{-\zeta^2}/\sqrt{\pi} - \zeta\,\mathrm{erfc}(\zeta)}{\sqrt{\zeta e^{-\zeta^2}/\sqrt{\pi} - \zeta^2\,\mathrm{erfc}(\zeta)}}.$$ (72)

Note that due to Eq. (57) $H$ vanishes at the point of diverging $\chi$. Hence the
dynamical phase transition between the ergodic and non-ergodic regimes coincides with
the transition between efficient ($H = 0$) and non-efficient ($H > 0$) phases observed
in [1]. One finds numerically that $\alpha_c < \alpha^\star$ for all fixed values of the parameters $\lambda$, $\Gamma$, $R$,
so that we conclude that the 1 + 0-phase is physically realised for intermediate values
of $\alpha \in [\alpha_c, \alpha^\star]$.

‡ Note that a very similar condition $\alpha_c = 1 - \phi(\alpha_c)$ has been found in the context of the Minority
Game [11]. There, $\phi$ denotes the fraction of so-called frozen agents.
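The two phase boundaries can be evaluated with a few lines of code. The sketch below is illustrative only and assumes that (70) reads $\alpha^\star = 1 + 2\lambda^2/(\pi R^2)$ and that $\zeta_c$ solves $(R/\lambda)^2 = [e^{-\zeta^2}/\sqrt{\pi} - \zeta\,\mathrm{erfc}(\zeta)]/\zeta$, which is the square of the root condition above; the function names are hypothetical. Since the right-hand side decreases monotonically in $\zeta$, bisection is sufficient.

```python
import math


def alpha_star(lam, R):
    """Boundary between the 1 + 0 phase and the 2-phase (assumed form of (70))."""
    return 1.0 + 2.0 * lam ** 2 / (math.pi * R ** 2)


def alpha_c(lam, R, tol=1e-13):
    """Onset of diverging susceptibility: alpha_c = erfc(zeta_c)."""
    target = (R / lam) ** 2

    def excess(zeta):
        # [exp(-zeta^2)/sqrt(pi) - zeta*erfc(zeta)] / zeta - (R/lam)^2, decreasing in zeta
        a = math.exp(-zeta * zeta) / math.sqrt(math.pi) - zeta * math.erfc(zeta)
        return a / zeta - target

    lo, hi = 1e-12, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.erfc(0.5 * (lo + hi))


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        print(lam, alpha_c(lam, 1.0), alpha_star(lam, 1.0))
```

For the three values of $\lambda$ used in the simulations (with $R = 1$ taken here purely for illustration) the loop prints both boundaries, so the ordering $\alpha_c < \alpha^\star$ can be inspected directly.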

4.2.2. The 2-phase at $\alpha > \alpha^\star$. Here we set $u_\sigma = 0$ for both $\sigma \in \{-1, 1\}$ in (52).
Summing the resulting expressions for $\sigma = 1$ and $\sigma = -1$ one immediately finds $\varphi = 0$
for $\eta = 0$. Taking the difference, instead, yields

This in turn implies that 

from which we can directly read off the value of q in the 2-phase: 

$$q = \frac{\lambda^2}{\alpha - 1},$$ (75)

in agreement with the corresponding static result given in [1]. For the susceptibility one
obtains

$$\chi = \frac{1}{\sqrt{\alpha}} \left\langle \frac{\partial y}{\partial \xi} \right\rangle = \frac{1}{\Gamma(\alpha - 1)}.$$ (76)

Eqs. (75) and (76) along with our result $\varphi = 0$ completely describe the persistent order
parameters in the ergodic steady states at $\alpha > \alpha^\star$. Note that $\alpha^\star > 1$ by virtue of (70),
so that no singularities occur in the 2-phase. Using Eq. (57) $H$ is given by

$$H = \lambda^2 (\alpha - 1)$$ (77)

for $\alpha > \alpha^\star$.
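For completeness, the 2-phase order parameters can be tabulated directly. This is a minimal sketch assuming that (75) and (76) read $q = \lambda^2/(\alpha - 1)$ and $\chi = 1/[\Gamma(\alpha - 1)]$; the function name is illustrative.

```python
def two_phase_order_parameters(alpha, lam, Gamma):
    """Persistent order parameters in the 2-phase (assumed forms of (75) and (76))."""
    if alpha <= 1.0:
        raise ValueError("the 2-phase expressions are singular at alpha = 1")
    return {
        "phi": 0.0,                           # follows from summing (52) at eta = 0
        "q": lam ** 2 / (alpha - 1.0),
        "chi": 1.0 / (Gamma * (alpha - 1.0)),
    }


if __name__ == "__main__":
    print(two_phase_order_parameters(alpha=2.0, lam=1.0, Gamma=0.5))
```

The product $\Gamma\chi$ depends on $\alpha$ only, in line with the statement that the learning rate does not affect the re-scaled stationary order parameters.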



4.2.3. Comparison with simulations. We have tested our theoretical predictions for
the game without impact-correction against numerical simulations. Results for $H$ and
$q$ are presented in Fig. 3, while Fig. 4 shows the behaviour of $\phi_0$. All simulations
are performed on the on-line update rules (given by Eq. (8)) with $\alpha N^2 = 10^4$.
Measurements are taken over 50000 time-steps preceded by 150000 equilibration steps.
All data presented are averages over 100 samples of the disorder. The learning rate is
kept fixed at $\Gamma = 1$ (we have verified the independence of $q$ and $H$ of the learning rate in
separate simulations). The figures demonstrate very good agreement of the theoretical
predictions with the numerical data, modulo finite-size effects close to the transition
points. We observe that $q$ is a decreasing function of $\alpha$ in the two ergodic phases, with a
cusp at the transition point between the 1 + 0- and the 2-phase at $\alpha = \alpha^\star$. The breakdown
of ergodicity below $\alpha_c$ can be illustrated by starting the dynamics from differently biased
initial propensities. While the macroscopic order parameter $q$ is insensitive to initial
conditions above $\alpha_c$, the starting point becomes relevant in the non-ergodic phase, as
shown in Fig. 1.




Figure 3. Order parameters $q$ and $H/\alpha$ as functions of $\alpha$ for fixed $\eta = 0$. Markers are
from simulations of the on-line model for $\lambda = 0.5$ (circles), $\lambda = 1$ (squares) and $\lambda = 2$
(diamonds), started from unbiased initial conditions. We set $\Gamma = 1$ for all three curves.
The solid lines are the predictions of the analytical theory and have been continued
as dashed lines into phases where they are no longer valid. The vertical lines indicate
the analytically obtained locations of the ergodic/non-ergodic phase transition at $\alpha_c$
for the three different values of $\lambda = 0.5, 1, 2$ (from right to left).





Figure 4. Fraction $\phi_0$ of agents who do not invest under either signal for fixed $\eta = 0$.
Markers are from simulations of the on-line model for $\lambda = 0.5$ (circles), $\lambda = 1$ (squares)
and $\lambda = 2$ (diamonds), started from unbiased initial conditions. We set $\Gamma = 1$ for all
three curves. The solid lines are the predictions of the analytical theory in the phase
of intermediate $\alpha$. Outside this phase $\phi_0$ is predicted to be zero. The vertical lines
indicate the analytically obtained locations of the ergodic/non-ergodic phase transition
at $\alpha_c$ for the three different values of $\lambda = 0.5, 1, 2$ (from right to left).

4.3. $\eta > 0$

For $\eta > 0$ the situation is slightly more complicated. One observes at all $\alpha$ that agents
are divided in two classes: those who trade under both signals and those who trade at
most under one signal. Let us discuss this scenario in detail. The former have

$$u_\sigma = 0, \quad z_\sigma > 0, \quad \sigma \in \{-1, 1\}.$$ (78)

Notice that for these agents $x = (z_+ + z_-)/2 > |y| = |(z_+ - z_-)/2|$. For the latter one
has instead (62) and (63) as before. For them, $x = y$ when $y > 0$ and $x = -y$ when
$y < 0$.

Let us start with the traders who always invest. Summing equations (52) with
non-zero $\eta$ and $u_\sigma = 0$ one gets

$$\varphi = -\Gamma\eta\alpha x,$$ (79)

whereas taking the difference gives 



The requirement $|y| < x$ now translates into the condition



(80) 



on the effective noise. Note that $\varphi < 0$ by virtue of (79). Turning to agents who trade
under one signal only, summing equations (52) with non-zero $\eta$ one finds that (64) and
(65) generalize to

$$u = \varphi + \Gamma\eta\alpha|y|,$$ (82)

where the minus (resp. plus) sign holds for agents with $y > 0$ (resp. $y < 0$). Let us
focus on agents with $y > 0$. Combining (82) and (83) one gets

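$$y = \frac{\sqrt{\alpha}\,\xi - \varphi}{\alpha\Gamma \left[ 2\eta + 1/(1 + \Gamma\chi) \right]}.$$ (84)

On the other hand, since $u > 0$ we must have

$$\xi > \sqrt{\alpha}\,\Gamma y \left( \eta + \frac{1}{1 + \Gamma\chi} \right),$$ (85)

which via (84) becomes

$$\xi > -\frac{\varphi}{\eta\sqrt{\alpha}} \left( \eta + \frac{1}{1 + \Gamma\chi} \right).$$ (86)

Similarly one finds that the solution with $y < 0$ corresponds to

$$\xi < \frac{\varphi}{\eta\sqrt{\alpha}} \left( \eta + \frac{1}{1 + \Gamma\chi} \right).$$ (87)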

Figure 5. Order parameters $q$ and $H/\alpha$ for $\eta > 0$ as functions of $\alpha$ at fixed values of
$\Gamma = \lambda = 1$. Markers are from simulations of the on-line model for $\eta = 0.05$ (circles),
$\eta = 0.25$ (squares), $\eta = 0.5$ (diamonds) and $\eta = 0.75$ (triangles), started from unbiased
initial conditions. The solid lines are the predictions of the analytical theory in the
1 + 2 phase.

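Hence, defining $\bar\xi = -\frac{\varphi}{\eta\sqrt{\alpha}} \left( \eta + \frac{1}{1 + \Gamma\chi} \right)$, the fraction of players who invest under both
signals can be written as

$$\phi_2 = \langle \theta(\bar\xi - |\xi|) \rangle,$$ (88)

where $\langle \ldots \rangle$ is again an average over $\xi$ (with variance given by (54)). The saddle-point
conditions (56) take the form:

$$\begin{aligned}
\sqrt{\alpha}\,\Gamma R &= -\frac{\varphi}{\eta\sqrt{\alpha}} \langle \theta(\bar\xi - |\xi|) \rangle + \frac{\langle (\xi - \xi^\star)\,\theta(\xi - \bar\xi) \rangle - \langle (\xi + \xi^\star)\,\theta(-\xi - \bar\xi) \rangle}{2\eta + 1/(1 + \Gamma\chi)}, \\
\alpha\Gamma^2 q &= \frac{\langle \xi^2\,\theta(\bar\xi - |\xi|) \rangle}{[\eta + 1/(1 + \Gamma\chi)]^2} + \frac{\langle (\xi - \xi^\star)^2\,\theta(\xi - \bar\xi) \rangle + \langle (\xi + \xi^\star)^2\,\theta(-\xi - \bar\xi) \rangle}{[2\eta + 1/(1 + \Gamma\chi)]^2}, \\
\alpha\Gamma\chi &= \frac{\langle \theta(\bar\xi - |\xi|) \rangle}{\eta + 1/(1 + \Gamma\chi)} + \frac{\langle \theta(\xi - \bar\xi) \rangle + \langle \theta(-\xi - \bar\xi) \rangle}{2\eta + 1/(1 + \Gamma\chi)},
\end{aligned}$$ (89)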

where as before $\xi^\star = \varphi/\sqrt{\alpha}$. We do not report the lengthy expressions one obtains after
the averages over $\xi$ are carried out, but would like to stress that one readily checks that
these equations do not allow for a diverging susceptibility at any finite $\alpha$. Therefore the
model with impact-correction does not exhibit anomalous response, in contrast with the
game at $\eta = 0$. From (89) the order parameters $\varphi$, $q$ and $\chi$ can be obtained numerically
as functions of $\alpha$ for any fixed values of $\eta > 0$, $\lambda$ and $\Gamma$, and near perfect agreement is
found with numerical simulations, see Figs. 5 and 6. Observing no systematic deviations
from the theory, we have no reason to suspect an onset of long-term memory at finite
$\chi$, and hence the physical picture of the present model at $\eta > 0$ appears different from
the behaviour of the MG with impact correction [14].
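The absence of a diverging susceptibility can be made concrete with a small numerical experiment. The sketch below assumes that the susceptibility relation in (89) has the form $\alpha\Gamma\chi = \phi_2/[\eta + 1/(1+\Gamma\chi)] + (1 - \phi_2)/[2\eta + 1/(1+\Gamma\chi)]$, and it treats the fraction $\phi_2$ as a given input instead of solving the full coupled system; both choices are simplifying assumptions, and the function name is hypothetical. Because both denominators are bounded below by $\eta$, the right-hand side never exceeds $1/\eta$, so $\Gamma\chi \le 1/(\alpha\eta)$ and a bisection on the interval $[0, 1/(\alpha\eta)]$ always brackets the solution.

```python
def gamma_chi_with_impact_correction(alpha, eta, phi2, tol=1e-13):
    """Solve alpha*X = phi2/(eta + 1/(1+X)) + (1-phi2)/(2*eta + 1/(1+X)) for X = Gamma*chi.

    ASSUMPTION: this is our reading of the susceptibility relation in (89);
    phi2 (fraction of agents trading under both signals) is supplied by hand.
    """
    if not (alpha > 0.0 and eta > 0.0 and 0.0 <= phi2 <= 1.0):
        raise ValueError("need alpha > 0, eta > 0 and 0 <= phi2 <= 1")

    def mismatch(x):
        d1 = eta + 1.0 / (1.0 + x)
        d2 = 2.0 * eta + 1.0 / (1.0 + x)
        return alpha * x - phi2 / d1 - (1.0 - phi2) / d2

    lo, hi = 0.0, 1.0 / (alpha * eta)   # mismatch(lo) < 0 < mismatch(hi)
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for phi2 in (0.0, 0.5, 1.0):
        print(phi2, gamma_chi_with_impact_correction(alpha=0.5, eta=0.25, phi2=phi2))
```

Whatever value $\phi_2$ takes, the returned $\Gamma\chi$ stays below $1/(\alpha\eta)$, which is finite for any $\eta > 0$ and $\alpha > 0$.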

Figure 6. Fraction $\phi_2$ of agents who invest under both signals, shown for different
values of $\eta > 0$. Markers are from simulations of the on-line model for $\eta = 0.05$
(circles), $\eta = 0.25$ (squares), $\eta = 0.5$ (diamonds) and $\eta = 0.75$ (triangles), started from
unbiased initial conditions and with $\Gamma = 1$, $\lambda = 1$. The solid lines are the predictions
of the analytical theory in the 1 + 2 phase.



5. Concluding remarks 

We have presented an analysis of the dynamics of a system of adaptive agents with
private asymmetric information, complementing and extending the study of the statics
of the model previously presented in [1]. To this end we have devised a batch version of
the original on-line update rules and observe no significant effects on the persistent order
parameters in the stationary states. This demonstrates that the replacement of the on-line
dynamics by an information-averaged batch process, as successfully performed in the
context of the MG, can be extended to other models of interacting agents. Path integral
methods can then be used to turn the coupled batch dynamics of the $N$ interacting
agents into an effective single-agent problem in the limit $N \to \infty$. From this effective
process the persistent order parameters in the different ergodic stationary states as well
as the phase diagram can be computed exactly and in agreement with the static results
obtained via replica techniques. For the model without impact-correction ($\eta = 0$)
we find three different phases, two ergodic ones and a phase with broken ergodicity
and dependence of the stationary macroscopic order parameters on initial conditions.
The location of the onset of ergodicity breaking, $\alpha_c$, coincides with the location of the
transition between efficient and non-efficient phases identified in [1]. For sophisticated
agents ($\eta > 0$) only one phase is present, and no ergodicity breaking occurs. The
generating functional approach also allows one to address the issue of (average) fluctuations
of prices around the (quenched) returns, and an exact relation to the dynamical order
parameters can be drawn. The computation of $H$ does not require the knowledge of
the transient contributions of the correlation and response functions, but only of their
persistent parts (which we can compute exactly). On the other hand full solutions of the
self-consistent effective problem for the functions $C(\tau)$ and $G(\tau)$ are needed to calculate
the fluctuations of $p(t)$ around $R(t)$. While the corresponding fluctuations in the MG
(the so-called volatility) can be well estimated in terms of persistent order parameters,
similar approximations appear to be much more delicate in the present model. In
addition we find that the volatilities of the batch and on-line models are different,
which is not surprising as they depend on transients in $C$ and $G$. Moreover a
dependence of the magnitude of fluctuations on the learning rate has been reported
in [1]. An analytical study of this dependence is beyond the scope of the present paper;
it appears that these issues are more effectively tackled in suitably simplified versions
of the model of [1], work on which is in progress.

In conclusion the dynamical mean field theory extensively used in the context of
the MG with common public information can be extended to models of interacting
agents with asymmetric non-uniform information. The present model can up to now
presumably at best be seen as a most simplistic model of a financial market. Possible
extensions include models with heterogeneous learning rates $\Gamma_i$. Such models are of
interest from the mathematical point of view, as they would lead to an ensemble
of effective processes similar to [15,16], and would also allow one to study the interaction
and relative success of agents with different abilities of adaptation. In the same realm
the individual wealth of the agents could be taken into account, each varying in time
according to the performance of the agents. It might also be worthwhile studying the
influence of decision noise on the phase diagram. Another presumably most interesting
extension of the model would be one in which the information available to the agents
is not only asymmetric, but also noisy. Finally, in the present model the 'states of the
world' $\omega$ are drawn at random at each time step. With techniques now available to
study Minority Games with real histories [4], an attempt might be made to replace this
external random signal by endogenously generated pieces of information relating to the
previous history of the market. The state $\omega(t)$ at a given time step $t$ might for example
encode the prices $p(t-1), p(t-2), \ldots, p(t-M)$ of the previous $M$ steps or relate to
the history of differences between prices and returns. Work along some of these lines is
currently in progress.